Abstract

In this paper we make a first attempt at understanding how to build
an optimal approximate normal factor analysis model. The criterion we
have chosen to evaluate the distance between different models is the
I-divergence between the corresponding normal laws. The algorithm that
we propose for the construction of the best approximation is of the
alternating minimization kind.

1 Introduction 

Factor analysis, in its original formulation, is the linear statistical model 

Y = HX + e (1.1) 

where H is a deterministic matrix, X and e independent random vectors, the 
first with dimension smaller than Y, the second with independent components. 
What makes this model attractive in applied research is the data reduction 
mechanism built in it. A large number of observed variables Y are explained in 
terms of a small number of unobserved (latent) variables X perturbed by the 
independent noise e. Under normality assumptions, which are the rule in the 
standard theory, all the laws of the model are specified by covariance matrices. 
More precisely, assume that X and e are zero mean independent normal vectors
with Cov(X) = P and Cov(e) = D, where D is diagonal. It follows from (1.1)
that Cov(Y) = $HPH^T + D$.

Building a factor analysis model of the observed data requires the solution
of a difficult algebraic problem. Given $\Sigma_0$, the covariance matrix of Y, find the
triples (H, P, D) such that $\Sigma_0 = HPH^T + D$. Due to the structural constraint
on D, which is assumed to be diagonal, the existence and uniqueness of a factor
analysis model are not guaranteed. As it turns out, the right tools to deal with
this situation come from the theory of stochastic realization, see [5] (trying to
spot the master's hand) for an early contribution on the subject.

In the present paper we make a first attempt at understanding how to build 
an optimal approximate factor analysis model. The criterion we have chosen 
to evaluate the distance between covariances is the I-divergence between the 
corresponding normal laws. The algorithm that we propose for the construction 
of the best approximation is inspired by the alternating minimization procedure 
of g] and [6].

2 The model 

Consider two independent, zero mean, normal vectors X and e of respective
dimensions k and n. We will assume that Cov(X) = I, the identity matrix, and
Cov(e) = D > 0, a diagonal matrix. Let H be an n × k matrix (in this paper
k < n) and let the random vector Y be defined by

Y = HX + e. (2.1) 

Under these assumptions (2.1) is called a factor analysis (FA) model of size k
for the vector Y. Notice that allowing Cov(X) = P > 0 does not produce a
more general model, as a square root of P can always be absorbed in H. We
will say that a normal vector Y admits a FA model of size k if it is equal in
distribution to HX + e for some X and e as above, i.e. if its covariance $\Sigma_0$
can be written as $\Sigma_0 = HH^T + D$. Not every normal vector Y admits a FA
model, the hard constraint being imposed by the diagonal structure of D. A
probabilistic interpretation stems from Cov(Y|X) = D (see equation (A.1) of
the Appendix), i.e. the n components of Y are conditionally independent given
the k < n components of some vector X. In Remark 3.1 of the next section the
condition for the existence of a FA model is slightly reformulated.

Although the construction of an exact FA model is not always possible, one can
search for a best approximate model, according to some criterion. In this paper
we opt for minimizing the I-divergence (Kullback-Leibler distance) between
normal laws. Recall that given two probability measures $P_1$ and $P_2$, defined
on the same measurable space, such that $P_1 \ll P_2$, the I-divergence of $P_1$ with
respect to $P_2$ is defined as

$$D(P_1\|P_2) = E_{P_1} \log \frac{dP_1}{dP_2}.$$

If $P_1$ and $P_2$ are normal measures on the same space $\mathbb{R}^n$, with zero means and
strictly positive covariance matrices $\Sigma_1$ and $\Sigma_2$ respectively, the I-divergence
$D(P_1\|P_2)$ takes the explicit form

$$D(P_1\|P_2) = \frac{1}{2}\log\frac{|\Sigma_2|}{|\Sigma_1|} + \frac{1}{2}\,\mathrm{tr}(\Sigma_2^{-1}\Sigma_1) - \frac{n}{2}. \tag{2.2}$$

Since the I-divergence only depends on the covariance matrices, we usually write
$D(\Sigma_1\|\Sigma_2)$ instead of $D(P_1\|P_2)$.

The approximate factor analysis problem can be posed as follows:

Problem 2.1. Given the positive covariance matrix $\Sigma_0 \in \mathbb{R}^{n\times n}$ and the integer
k < n, minimize

$$D(\Sigma_0\|HH^T + D) = \frac{1}{2}\log\frac{|HH^T + D|}{|\Sigma_0|} + \frac{1}{2}\,\mathrm{tr}\big((HH^T + D)^{-1}\Sigma_0\big) - \frac{n}{2}$$

over all pairs (H, D) where $H \in \mathbb{R}^{n\times k}$ and D > 0 is of size n and diagonal.

Notice that $D(\Sigma_1\|\Sigma_2)$, computed as in (2.2), can be considered as a divergence
between two positive definite matrices, without referring to normal distributions.
Hence Problem 2.1 also has a meaning when one refrains from assumptions like
normality.
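
As an illustration only, the divergence (2.2) and the objective of Problem 2.1 can be evaluated numerically along the following lines. This is a minimal NumPy sketch; the function names `i_divergence` and `fa_objective`, the random test data, and the choice of storing D as a vector of diagonal entries are assumptions made here, not part of the paper.

```python
import numpy as np

def i_divergence(sigma1, sigma2):
    """D(sigma1 || sigma2) as in (2.2), for symmetric positive definite inputs."""
    n = sigma1.shape[0]
    # slogdet returns (sign, log|det|); both determinants are positive here.
    _, logdet1 = np.linalg.slogdet(sigma1)
    _, logdet2 = np.linalg.slogdet(sigma2)
    # solve(sigma2, sigma1) yields sigma2^{-1} sigma1 without an explicit inverse.
    trace_term = np.trace(np.linalg.solve(sigma2, sigma1))
    return 0.5 * (logdet2 - logdet1) + 0.5 * trace_term - 0.5 * n

def fa_objective(sigma0, H, d):
    """Objective of Problem 2.1; d holds the diagonal entries of D."""
    return i_divergence(sigma0, H @ H.T + np.diag(d))

# Hypothetical data: a covariance that admits an exact FA model of size k.
rng = np.random.default_rng(0)
n, k = 5, 2
H_true = rng.standard_normal((n, k))
d_true = rng.uniform(0.5, 1.5, size=n)
sigma0 = H_true @ H_true.T + np.diag(d_true)

exact = fa_objective(sigma0, H_true, d_true)            # divergence at the exact model
perturbed = fa_objective(sigma0, H_true + 0.1, d_true)  # divergence at a perturbed H
```

At the exact pair the objective vanishes up to rounding, while a perturbed H gives a strictly positive value, in line with the interpretation of (2.2) as a divergence between positive definite matrices.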

Existence of the minimum is guaranteed by the following

Proposition 2.2. There exist matrices $H^* \in \mathbb{R}^{n\times k}$ and $D^* > 0$ of size n and
diagonal minimizing the I-divergence in Problem 2.1.

The proof is deferred to Section 4.2 since it uses later results.
In order to construct an algorithm for the solution of Problem 2.1 we will imitate
the approach of [6]. The algorithm will therefore be derived by a relaxation
technique, lifting the original minimization problem to a higher dimensional space.
In the larger space a double minimization problem equivalent to Problem 2.1
can be formulated, leading in a natural way to an alternating minimization
algorithm.



3 Lifting of the original problem 



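In this section we will embed Problem 2.1 into a higher dimensional space. First
we introduce the relevant sets of covariances. Given k < n we denote by

$$\boldsymbol{\Sigma} = \left\{\Sigma \in \mathbb{R}^{(n+k)\times(n+k)} : \Sigma = \begin{pmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{pmatrix} > 0\right\}, \tag{3.1}$$

where $\Sigma_{11}$ is n × n. Two subsets of $\boldsymbol{\Sigma}$ will play a special role. The first is

$$\boldsymbol{\Sigma}_0 = \{\Sigma \in \boldsymbol{\Sigma} : \Sigma_{11} = \Sigma_0\}, \tag{3.2}$$

where $\Sigma_0$ is a given covariance. We also consider the subset

$$\boldsymbol{\Sigma}_1 = \left\{\Sigma \in \boldsymbol{\Sigma} : \Sigma = \begin{pmatrix} HH^T + D & HQ\\ (HQ)^T & Q^TQ\end{pmatrix}\right\}, \tag{3.3}$$

where $H \in \mathbb{R}^{n\times k}$, $Q \in \mathbb{R}^{k\times k}$ is invertible, and D > 0 is diagonal. Elements of $\boldsymbol{\Sigma}_1$ will
often be denoted by $\Sigma(H, D, Q)$.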

Remark 3.1. Notice that a normal vector Y, with Cov(Y) = $\Sigma_0$, admits a FA
model of size k iff $\boldsymbol{\Sigma}_0 \cap \boldsymbol{\Sigma}_1 \neq \emptyset$. Supposing that this is the case, take
$\Sigma \in \boldsymbol{\Sigma}_0 \cap \boldsymbol{\Sigma}_1$; then, for some (H, D, Q),

$$\Sigma = \begin{pmatrix}\Sigma_0 & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{pmatrix} = \begin{pmatrix} HH^T + D & HQ\\ (HQ)^T & Q^TQ\end{pmatrix} > 0$$

is a bona fide covariance of a normal vector V of dimension n + k. Partition
$V^T = (Y^T, Z^T)$. It is easy to verify that Cov(Y) = $\Sigma_0 = HH^T + D$ is the
same as Cov(HX + e) for some X standard normal and e normal, independent
of X, and with diagonal covariance D.
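
The structure of the elements of the set defined in (3.3) can be checked numerically. The sketch below builds a lifted covariance from hypothetical values of H, D and Q and verifies that it is positive definite and that the Schur complement of its lower right block equals D, which is the conditional covariance of Y given Z by (A.1). The helper name `lifted_covariance` and all numerical values are assumptions chosen for illustration.

```python
import numpy as np

def lifted_covariance(H, d, Q):
    """Sigma(H, D, Q) as in (3.3); d holds the diagonal entries of D."""
    top = np.hstack([H @ H.T + np.diag(d), H @ Q])
    bottom = np.hstack([(H @ Q).T, Q.T @ Q])
    return np.vstack([top, bottom])

rng = np.random.default_rng(1)
n, k = 4, 2
H = rng.standard_normal((n, k))
d = rng.uniform(0.5, 1.5, size=n)
Q = np.array([[2.0, 0.5],
              [0.0, 1.0]])          # any invertible k x k matrix would do

sigma = lifted_covariance(H, d, Q)
s11, s12 = sigma[:n, :n], sigma[:n, n:]
s21, s22 = sigma[n:, :n], sigma[n:, n:]

# Conditional covariance of the first n components given the last k, see (A.1).
schur = s11 - s12 @ np.linalg.solve(s22, s21)
min_eigenvalue = np.linalg.eigvalsh(sigma).min()
```

The Schur complement does not depend on Q, which reflects the fact that Q only parametrizes the auxiliary vector Z.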

The lifted minimization problem can be posed as follows.

Problem 3.2.

$$\min_{\Sigma' \in \boldsymbol{\Sigma}_0,\ \Sigma_1 \in \boldsymbol{\Sigma}_1} D(\Sigma'\|\Sigma_1),$$

which can be viewed as an iterated minimization problem over each of the
variables. The two resulting partial minimization problems will be investigated
in the following sections. In Section 3.3 we will show the connection between
Problems 2.1 and 3.2. More precisely, we will prove

Proposition 3.3. Let $\Sigma_0$ be given. It holds that

$$\min_{H,D} D(\Sigma_0\|HH^T + D) = \min_{\Sigma' \in \boldsymbol{\Sigma}_0,\ \Sigma_1 \in \boldsymbol{\Sigma}_1} D(\Sigma'\|\Sigma_1).$$

3.1 The first partial minimization problem

In this section we consider the first of the two partial minimization problems.
Here we minimize, for a given positive definite matrix $\Sigma \in \boldsymbol{\Sigma}$, the divergence
$D(\Sigma'\|\Sigma)$ over $\Sigma' \in \boldsymbol{\Sigma}_0$. The unique solution to this problem can be computed
analytically and follows from

Lemma 3.4. Let (Y, X) be a random vector distributed according to some $Q = Q^{Y,X}$
and let $\mathcal{P}$ be the set of all distributions $P = P^{Y,X}$ whose marginal $P^Y = P_0$,
for some fixed $P_0 \ll Q^Y$. Then $\min_{P \in \mathcal{P}} D(P\|Q) = D(P^*\|Q)$ where $P^*$ is
given by the Radon-Nikodym derivative

$$\frac{dP^*}{dQ} = \frac{dP_0}{dQ^Y}.$$

Moreover,

$$D(P^*\|Q) = D(P_0\|Q^Y), \tag{3.4}$$

and, for any $P \in \mathcal{P}$, the Pythagorean law

$$D(P\|Q) = D(P\|P^*) + D(P^*\|Q) \tag{3.5}$$

holds.

Proof. First we show that (3.4) holds. Recall that Y has law $P_0$ under $P^*$; then

$$D(P^*\|Q) = E_{P^*} \log \frac{dP^*}{dQ} = E_{P^*} \log \frac{dP_0}{dQ^Y} = E_{P_0} \log \frac{dP_0}{dQ^Y} = D(P_0\|Q^Y).$$

To show that $P^*$ is a minimizer it is clearly sufficient to prove that (3.5) holds.

$$\begin{aligned}
D(P\|Q) &= E_P \log \frac{dP}{dP^*} + E_P \log \frac{dP^*}{dQ}\\
&= D(P\|P^*) + E_P \log \frac{dP_0}{dQ^Y}\\
&= D(P\|P^*) + E_{P_0} \log \frac{dP_0}{dQ^Y} = D(P\|P^*) + D(P^*\|Q),
\end{aligned}$$

where we used the fact that all $P \in \mathcal{P}$ have marginal $P^Y = P_0$. □

Remark 3.5. The law $P^*$ is easily characterized in terms of the problem data $P_0$
and Q, noticing that the marginal $P^{*Y} = P_0$ and the conditional $P^{*X|Y} = Q^{X|Y}$.



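We now apply Lemma 3.4 to the case of normal laws and solve the first partial
minimization. See also [5] for a different proof.

Proposition 3.6. Let Q and $P_0$ be zero mean normal laws with strictly positive
covariances $\Sigma \in \boldsymbol{\Sigma}$ and $\Sigma_0 \in \mathbb{R}^{n\times n}$ respectively. Then $\min_{\Sigma' \in \boldsymbol{\Sigma}_0} D(\Sigma'\|\Sigma)$ is
attained by the zero mean normal law $P^*$ with covariance

$$\Sigma^* = \begin{pmatrix} \Sigma_0 & \Sigma_0\Sigma_{11}^{-1}\Sigma_{12}\\ \Sigma_{21}\Sigma_{11}^{-1}\Sigma_0 & \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}(\Sigma_{11} - \Sigma_0)\Sigma_{11}^{-1}\Sigma_{12}\end{pmatrix} > 0.$$

Moreover,

$$D(\Sigma^*\|\Sigma) = D(\Sigma_0\|\Sigma_{11}).$$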
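Proof. This follows from Remark 3.5. A direct computation gives

$$\begin{aligned}
\Sigma^*_{21} &= E_{P^*} XY^T = E_{P^*}\big(E_{P^*}[X|Y]\,Y^T\big)\\
&= E_{P^*}\big(E_Q[X|Y]\,Y^T\big) = E_{P^*}\big(\Sigma_{21}\Sigma_{11}^{-1}YY^T\big)\\
&= \Sigma_{21}\Sigma_{11}^{-1} E_{P_0} YY^T = \Sigma_{21}\Sigma_{11}^{-1}\Sigma_0.
\end{aligned}$$

Likewise, we have

$$\begin{aligned}
\Sigma^*_{22} &= E_{P^*} XX^T = \mathrm{Cov}_{P^*}(X)\\
&= \mathrm{Cov}_{P^*}(X|Y) + E_{P^*}\big(E_{P^*}[X|Y]\,E_{P^*}[X|Y]^T\big)\\
&= \mathrm{Cov}_Q(X|Y) + E_{P^*}\big(E_Q[X|Y]\,E_Q[X|Y]^T\big)\\
&= \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12} + E_{P^*}\big(\Sigma_{21}\Sigma_{11}^{-1}Y(\Sigma_{21}\Sigma_{11}^{-1}Y)^T\big)\\
&= \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12} + E_{P_0}\big(\Sigma_{21}\Sigma_{11}^{-1}YY^T\Sigma_{11}^{-1}\Sigma_{12}\big)\\
&= \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12} + \Sigma_{21}\Sigma_{11}^{-1}\Sigma_0\Sigma_{11}^{-1}\Sigma_{12}.
\end{aligned}$$

Notice that, since $\Sigma > 0$ by assumption,

$$\Sigma^*_{22} - \Sigma^*_{21}(\Sigma^*_{11})^{-1}\Sigma^*_{12} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12} > 0,$$

which, together with the assumption $\Sigma_0 > 0$, shows that $\Sigma^* > 0$.

The last relation, $D(\Sigma^*\|\Sigma) = D(\Sigma_0\|\Sigma_{11})$, reflects equation (3.4). □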
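
The closed form of Proposition 3.6 lends itself to a quick numerical check. The following NumPy sketch, with hypothetical helper names (`first_partial_min`, `random_spd`) and randomly generated test matrices, computes the minimizer, compares the two divergences in the last relation of the proposition, and evaluates the divergence at one other element of the constraint set for comparison.

```python
import numpy as np

def i_divergence(s1, s2):
    """D(s1 || s2) from (2.2) for symmetric positive definite s1, s2."""
    n = s1.shape[0]
    _, logdet1 = np.linalg.slogdet(s1)
    _, logdet2 = np.linalg.slogdet(s2)
    return 0.5 * (logdet2 - logdet1) + 0.5 * np.trace(np.linalg.solve(s2, s1)) - 0.5 * n

def first_partial_min(sigma, sigma0):
    """Minimizer of D(. || sigma) over matrices whose upper left block is sigma0."""
    n = sigma0.shape[0]
    s11, s12 = sigma[:n, :n], sigma[:n, n:]
    s21, s22 = sigma[n:, :n], sigma[n:, n:]
    G = np.linalg.solve(s11, s12)              # Sigma_11^{-1} Sigma_12
    star12 = sigma0 @ G                        # Sigma_0 Sigma_11^{-1} Sigma_12
    star22 = s22 - s21 @ G + G.T @ sigma0 @ G  # lower right block of Sigma^*
    return np.block([[sigma0, star12], [star12.T, star22]])

def random_spd(m, rng):
    """Hypothetical helper: a well conditioned random positive definite matrix."""
    A = rng.standard_normal((m, m))
    return A @ A.T + m * np.eye(m)

rng = np.random.default_rng(2)
n, k = 4, 2
sigma = random_spd(n + k, rng)
sigma0 = random_spd(n, rng)

sigma_star = first_partial_min(sigma, sigma0)
lhs = i_divergence(sigma_star, sigma)          # D(Sigma^* || Sigma)
rhs = i_divergence(sigma0, sigma[:n, :n])      # D(Sigma_0 || Sigma_11)

# Another feasible matrix: same upper left block, inflated lower right block.
other = sigma_star.copy()
other[n:, n:] += np.eye(k)
other_value = i_divergence(other, sigma)
```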

3.2 The second partial minimization problem 

In this section we turn to the second partial minimization problem. Here we
minimize, for a given positive definite matrix $\Sigma \in \boldsymbol{\Sigma}$, the divergence $D(\Sigma\|\Sigma_1)$
over $\Sigma_1 \in \boldsymbol{\Sigma}_1$.

Clearly this problem cannot have a unique solution in terms of the matrices H
and Q. Indeed, if U is a unitary k × k matrix and $H' = HU$, $Q' = U^TQ$, then
$H'H'^T = HH^T$, $Q'^TQ' = Q^TQ$ and $H'Q' = HQ$. Nevertheless, the optimal
matrices $HH^T$, $HQ$ and $Q^TQ$ are unique, as we will see in Proposition 3.7.
First we need to introduce some notation and conventions. If P is a positive
definite matrix, we denote by $P^{1/2}$ any matrix satisfying $(P^{1/2})^T(P^{1/2}) = P$,
and by $P^{-1/2}$ its inverse. If M is any square matrix, we denote by $\Delta(M)$ the
diagonal matrix with entries

$$\Delta(M)_{ii} = M_{ii}.$$

Recall that we denote by $\Sigma(H, D, Q)$ a typical element of $\boldsymbol{\Sigma}_1$.

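Proposition 3.7. Given $\Sigma \in \boldsymbol{\Sigma}$, the $\min_{\Sigma_1 \in \boldsymbol{\Sigma}_1} D(\Sigma\|\Sigma_1)$ is attained at
$\Sigma_1^* = \Sigma(H^*, D^*, Q^*) \in \boldsymbol{\Sigma}_1$, where

$$Q^* = \Sigma_{22}^{1/2},$$
$$H^* = \Sigma_{12}\Sigma_{22}^{-1/2},$$
$$D^* = \Delta(\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}).$$

Thus the minimizing matrix $\Sigma_1^* = \Sigma(H^*, D^*, Q^*)$ becomes

$$\Sigma_1^* = \begin{pmatrix} \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} + \Delta(\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}) & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{pmatrix}. \tag{3.6}$$

Moreover, the Pythagorean law

$$D(\Sigma\|\Sigma(H,D,Q)) = D(\Sigma\|\Sigma_1^*) + D(\Sigma_1^*\|\Sigma(H,D,Q)) \tag{3.7}$$

holds for any $\Sigma(H, D, Q) \in \boldsymbol{\Sigma}_1$, and therefore $\Sigma_1^*$ is unique.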
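
For readers who want to experiment, the minimizer of Proposition 3.7 and the Pythagorean law (3.7) can be checked numerically as sketched below. The Cholesky factor is used here as one admissible choice of square root; this choice, the helper names, and the test data are assumptions of the sketch rather than prescriptions of the paper.

```python
import numpy as np

def i_divergence(s1, s2):
    """D(s1 || s2) from (2.2) for symmetric positive definite s1, s2."""
    n = s1.shape[0]
    _, logdet1 = np.linalg.slogdet(s1)
    _, logdet2 = np.linalg.slogdet(s2)
    return 0.5 * (logdet2 - logdet1) + 0.5 * np.trace(np.linalg.solve(s2, s1)) - 0.5 * n

def lifted_covariance(H, d, Q):
    """Sigma(H, D, Q) as in (3.3); d holds the diagonal entries of D."""
    return np.block([[H @ H.T + np.diag(d), H @ Q], [(H @ Q).T, Q.T @ Q]])

def second_partial_min(sigma, n):
    """(H*, D*, Q*) of Proposition 3.7 for a positive definite sigma."""
    s11, s12 = sigma[:n, :n], sigma[:n, n:]
    s21, s22 = sigma[n:, :n], sigma[n:, n:]
    # cholesky returns lower L with L L^T = s22, so Q = L^T satisfies Q^T Q = s22.
    Q = np.linalg.cholesky(s22).T
    H = s12 @ np.linalg.inv(Q)                 # Sigma_12 Sigma_22^{-1/2}
    d = np.diag(s11 - s12 @ np.linalg.solve(s22, s21)).copy()
    return H, d, Q

rng = np.random.default_rng(3)
n, k = 4, 2
A = rng.standard_normal((n + k, n + k))
sigma = A @ A.T + (n + k) * np.eye(n + k)      # hypothetical positive definite input

H_star, d_star, Q_star = second_partial_min(sigma, n)
sigma1_star = lifted_covariance(H_star, d_star, Q_star)

# An arbitrary competitor Sigma(H, D, Q) for the Pythagorean law (3.7).
H = rng.standard_normal((n, k))
d = rng.uniform(0.5, 1.5, size=n)
Q = np.array([[1.5, 0.3], [0.0, 0.8]])
competitor = lifted_covariance(H, d, Q)

lhs = i_divergence(sigma, competitor)
rhs = i_divergence(sigma, sigma1_star) + i_divergence(sigma1_star, competitor)
```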

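Proof. It is sufficient to show the validity of (3.7). We first compute

$$2D(\Sigma\|\Sigma(H,D,Q)) - 2D(\Sigma\|\Sigma_1^*).$$

It follows from Lemma A.1 that $|\Sigma(H,D,Q)| = |D| \times |Q^TQ|$. In view of
equation (2.2) the above difference becomes

$$\log|D| + \log|Q^TQ| - \log|D^*| - \log|Q^{*T}Q^*| + \mathrm{tr}\big(\Sigma(H,D,Q)^{-1}\Sigma\big) - \mathrm{tr}\big((\Sigma_1^*)^{-1}\Sigma\big). \tag{3.8}$$

Using Corollary A.2 we compute

$$\Sigma(H,D,Q)^{-1} = \begin{pmatrix} D^{-1} & -D^{-1}HQ^{-T}\\ -Q^{-1}H^TD^{-1} & Q^{-1}(H^TD^{-1}H + I)Q^{-T}\end{pmatrix}, \tag{3.9}$$

and hence we get that

$$\begin{aligned}
\mathrm{tr}\big(\Sigma(H,D,Q)^{-1}\Sigma\big) &= \mathrm{tr}\big(D^{-1}(\Sigma_{11} - HQ^{-T}\Sigma_{21})\big)\\
&\quad + \mathrm{tr}\big(-Q^{-1}H^TD^{-1}\Sigma_{12} + Q^{-1}(H^TD^{-1}H + I)Q^{-T}\Sigma_{22}\big)\\
&= \mathrm{tr}\big(D^{-1}(\Sigma_{11} - 2HQ^{-T}\Sigma_{21}) + Q^{-1}(H^TD^{-1}H + I)Q^{-T}\Sigma_{22}\big).
\end{aligned} \tag{3.10}$$

Apply now Lemma A.1 to (3.6) and write $\Delta = \Delta(\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})$, to get

$$(\Sigma_1^*)^{-1} = \Sigma(H^*,D^*,Q^*)^{-1} = \begin{pmatrix} \Delta^{-1} & -\Delta^{-1}\Sigma_{12}\Sigma_{22}^{-1}\\ -\Sigma_{22}^{-1}\Sigma_{21}\Delta^{-1} & \Sigma_{22}^{-1}\Sigma_{21}\Delta^{-1}\Sigma_{12}\Sigma_{22}^{-1} + \Sigma_{22}^{-1}\end{pmatrix}.$$

Therefore

$$\mathrm{tr}\big((\Sigma_1^*)^{-1}\Sigma\big) = \mathrm{tr}\big(\Delta^{-1}(\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})\big) + \mathrm{tr}\, I_k = \mathrm{tr}(\Delta^{-1}\Delta) + k = n + k. \tag{3.11}$$

Combining equations (3.8), (3.10), and (3.11), we find that

$$\begin{aligned}
2D(\Sigma\|\Sigma(H,D,Q)) - 2D(\Sigma\|\Sigma_1^*) ={}& \log|D| + \log|Q^TQ| - \log|D^*| - \log|Q^{*T}Q^*|\\
& + \mathrm{tr}\big(D^{-1}(\Sigma_{11} - HQ^{-T}\Sigma_{21})\big)\\
& + \mathrm{tr}\big(-Q^{-1}H^TD^{-1}\Sigma_{12} + Q^{-1}(H^TD^{-1}H + I)Q^{-T}\Sigma_{22}\big)\\
& - (n + k).
\end{aligned} \tag{3.12}$$

We proceed with the computation of $2D(\Sigma_1^*\|\Sigma(H,D,Q))$.

$$\begin{aligned}
2D(\Sigma(H^*,D^*,Q^*)\|\Sigma(H,D,Q)) ={}& \log|D| + \log|Q^TQ| - \log|D^*| - \log|Q^{*T}Q^*| - (n + k)\\
& + \mathrm{tr}\big(\Sigma(H,D,Q)^{-1}\Sigma(H^*,D^*,Q^*)\big).
\end{aligned} \tag{3.13}$$

Combining equations (3.10), (3.6), and $\mathrm{tr}\big(D^{-1}(\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} + \Delta)\big) = \mathrm{tr}(D^{-1}\Sigma_{11})$
we obtain

$$\mathrm{tr}\big(\Sigma(H,D,Q)^{-1}\Sigma(H^*,D^*,Q^*)\big) = \mathrm{tr}(D^{-1}\Sigma_{11}) - 2\,\mathrm{tr}(D^{-1}HQ^{-T}\Sigma_{21}) + \mathrm{tr}\big(Q^{-1}(H^TD^{-1}H + I)Q^{-T}\Sigma_{22}\big). \tag{3.14}$$

Insertion of (3.14) into (3.13) and a comparison with (3.12) yields the result. □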

Remark 3.8. Notice that the matrix $H^*H^{*T}$ is strictly dominated by $\Sigma_{11}$
(in the sense of positive matrices). This easily follows from $\Sigma_{11} - H^*H^{*T} =
\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} > 0$, and the assumption $\Sigma > 0$. By the same token $D^* > 0$.

3.3 The link to the original problem

We now establish the connection between the lifted problem and the original
Problem 2.1.

Proof of Proposition 3.3. Let $\Sigma_1 = \Sigma(H,D,Q)$ and denote by $\Sigma^* = \Sigma^*(\Sigma_1)$
the solution of the first partial minimization over $\boldsymbol{\Sigma}_0$. We have, for all $\Sigma' \in \boldsymbol{\Sigma}_0$,

$$\begin{aligned}
D(\Sigma'\|\Sigma_1) &\ge D(\Sigma^*\|\Sigma_1)\\
&= D(\Sigma_0\|HH^T + D)\\
&\ge \min_{H,D} D(\Sigma_0\|HH^T + D),
\end{aligned}$$

where we used Proposition 2.2 to write min on the RHS. It follows that

$$\inf_{\Sigma' \in \boldsymbol{\Sigma}_0,\ \Sigma_1 \in \boldsymbol{\Sigma}_1} D(\Sigma'\|\Sigma_1) \ge \min_{H,D} D(\Sigma_0\|HH^T + D).$$

Conversely, let $(H^*, D^*)$ be the minimizer of $(H, D) \mapsto D(\Sigma_0\|HH^T + D)$, pick
an arbitrary invertible $Q^*$, and let $\Sigma^* = \Sigma(H^*, D^*, Q^*)$ be the corresponding
element in $\boldsymbol{\Sigma}_1$. Furthermore, let $\Sigma^{**} \in \boldsymbol{\Sigma}_0$ be the minimizer of $\Sigma \mapsto D(\Sigma\|\Sigma^*)$
over $\boldsymbol{\Sigma}_0$. Then

$$\begin{aligned}
\min_{H,D} D(\Sigma_0\|HH^T + D) &= D(\Sigma_0\|H^*H^{*T} + D^*)\\
&\ge D(\Sigma^{**}\|\Sigma^*)\\
&\ge \inf_{\Sigma' \in \boldsymbol{\Sigma}_0,\ \Sigma_1 \in \boldsymbol{\Sigma}_1} D(\Sigma'\|\Sigma_1),
\end{aligned}$$

which shows the opposite inequality. Finally, to show that we can replace the
infima with minima also in the lifted problem, notice that (see Proposition 3.6)
$D(\Sigma^{**}\|\Sigma^*) = D(\Sigma_0\|H^*H^{*T} + D^*)$.

4 Alternating minimization algorithm 

In this section we combine the two partial minimization problems above to
derive an iterative algorithm for Problem 2.1. It turns out that this algorithm
is also instrumental in proving the existence of a solution to Problem 2.1.

4.1 The algorithm 

We suppose that the given matrix $\Sigma_0$ is strictly positive definite. Pick the
initial values $H_0$, $D_0$, $Q_0$ such that $H_0$ is of full rank, $D_0 > 0$ is diagonal, $Q_0$
and $H_0H_0^T + D_0$ are invertible.

At the t-th iteration the matrices $H_t$, $D_t$ and $Q_t$ are available. Start solving the
first partial minimization problem with $\Sigma = \Sigma(H_t, D_t, Q_t)$. Use the resulting
matrix as data for the second partial minimization, the solution of which gives
the update rules

$$Q_{t+1} = \Big(Q_t^TQ_t - Q_t^TH_t^T(H_tH_t^T + D_t)^{-1}H_tQ_t + Q_t^TH_t^T(H_tH_t^T + D_t)^{-1}\Sigma_0(H_tH_t^T + D_t)^{-1}H_tQ_t\Big)^{1/2}, \tag{4.1}$$

$$H_{t+1} = \Sigma_0(H_tH_t^T + D_t)^{-1}H_tQ_tQ_{t+1}^{-1}, \tag{4.2}$$

$$D_{t+1} = \Delta(\Sigma_0 - H_{t+1}H_{t+1}^T). \tag{4.3}$$

In (4.1) there is some freedom in computing the square root that determines
$Q_{t+1}$. Properly choosing the square root will result in the disappearance of $Q_t$
from the algorithm. This is an attractive feature, since $Q_t$ only serves as an
auxiliary variable. One can write the RHS of equation (4.1), before taking the
square root, as

$$Q_t^T\Big(I - H_t^T(H_tH_t^T + D_t)^{-1}(H_tH_t^T + D_t - \Sigma_0)(H_tH_t^T + D_t)^{-1}H_t\Big)Q_t$$

and denoting

$$R_t = I - H_t^T(H_tH_t^T + D_t)^{-1}(H_tH_t^T + D_t - \Sigma_0)(H_tH_t^T + D_t)^{-1}H_t \tag{4.4}$$

a possible square root is given by

$$Q_{t+1} = R_t^{1/2}Q_t.$$

Notice that $R_t$ only involves the iterates $H_t$ and $D_t$. The update equation (4.2)
can therefore be rewritten as

$$H_{t+1} = \Sigma_0(H_tH_t^T + D_t)^{-1}H_tR_t^{-1/2}. \tag{4.5}$$

The final version of the algorithm is given by equations (4.3), (4.4), and (4.5)
which, for clarity, we present as

Algorithm 4.1.

$$R_t = I - H_t^T(H_tH_t^T + D_t)^{-1}(H_tH_t^T + D_t - \Sigma_0)(H_tH_t^T + D_t)^{-1}H_t, \tag{4.6}$$

$$H_{t+1} = \Sigma_0(H_tH_t^T + D_t)^{-1}H_tR_t^{-1/2}, \tag{4.7}$$

$$D_{t+1} = \Delta(\Sigma_0 - H_{t+1}H_{t+1}^T). \tag{4.8}$$
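
A possible numerical rendering of Algorithm 4.1 is sketched below. It is an illustration under stated assumptions, not a reference implementation: the transposed Cholesky factor is taken as the square root of $R_t$ (any other admissible square root changes $H_{t+1}$ only by a unitary factor), D is stored as a vector of diagonal entries, and the names `fa_step`, `i_divergence`, the iteration count and the test data are hypothetical.

```python
import numpy as np

def i_divergence(s1, s2):
    """D(s1 || s2) from (2.2) for symmetric positive definite s1, s2."""
    n = s1.shape[0]
    _, logdet1 = np.linalg.slogdet(s1)
    _, logdet2 = np.linalg.slogdet(s2)
    return 0.5 * (logdet2 - logdet1) + 0.5 * np.trace(np.linalg.solve(s2, s1)) - 0.5 * n

def fa_step(sigma0, H, d):
    """One pass of (4.6)-(4.8); d holds the diagonal entries of D_t."""
    k = H.shape[1]
    S = H @ H.T + np.diag(d)
    G = np.linalg.solve(S, H)                         # (H H^T + D)^{-1} H
    R = np.eye(k) - H.T @ G + G.T @ sigma0 @ G        # (4.6), expanded
    L = np.linalg.cholesky(R)                         # R = L L^T, so R^{1/2} := L^T
    H_new = np.linalg.solve(L, (sigma0 @ G).T).T      # Sigma_0 G L^{-T}, i.e. (4.7)
    d_new = np.diag(sigma0 - H_new @ H_new.T).copy()  # (4.8)
    return H_new, d_new

rng = np.random.default_rng(4)
n, k = 6, 2
A = rng.standard_normal((n, n))
sigma0 = A @ A.T + n * np.eye(n)                      # hypothetical target covariance

H, d = rng.standard_normal((n, k)), np.ones(n)
values = [i_divergence(sigma0, H @ H.T + np.diag(d))]
for _ in range(50):
    H, d = fa_step(sigma0, H, d)
    values.append(i_divergence(sigma0, H @ H.T + np.diag(d)))

# When sigma0 already has an exact FA model and the iteration starts there,
# R_t = I and one step leaves (H, D) unchanged.
H_exact = rng.standard_normal((n, k))
d_exact = rng.uniform(0.5, 1.5, size=n)
H_next, d_next = fa_step(H_exact @ H_exact.T + np.diag(d_exact), H_exact, d_exact)
```

The list `values` records the objective of Problem 2.1 along the iterations, which makes the monotone behaviour of the algorithm easy to inspect.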

In order to avoid taking a square root at each step one can introduce the
matrices $K_t = H_tQ_t$ and $P_t = Q_t^TQ_t$ and write the updates for $K_t$ and $P_t$.
Equations (4.1), (4.2), and (4.3) easily give

Algorithm 4.2.

$$K_{t+1} = \Sigma_0(K_tP_t^{-1}K_t^T + D_t)^{-1}K_t, \tag{4.9}$$

$$P_{t+1} = P_t - K_t^T(K_tP_t^{-1}K_t^T + D_t)^{-1}(K_tP_t^{-1}K_t^T + D_t - \Sigma_0)(K_tP_t^{-1}K_t^T + D_t)^{-1}K_t,$$

$$D_{t+1} = \Delta(\Sigma_0 - K_{t+1}P_{t+1}^{-1}K_{t+1}^T).$$

After the final iteration, the T-th say, one can take $H_T = K_TQ_T^{-1}$, where $Q_T$ is
a square root of $P_T$.
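
The square-root-free recursion can be compared numerically with Algorithm 4.1. In the sketch below both recursions start from the same $(H_0, D_0)$ with $Q_0 = I$, and the products $K_tP_t^{-1}K_t^T$ and $H_tH_t^T$ are compared after a few steps. The function names, the Cholesky choice of square root in the first recursion, and the test data are assumptions made for illustration.

```python
import numpy as np

def fa_step(sigma0, H, d):
    """One pass of Algorithm 4.1 with the transposed Cholesky factor as R^{1/2}."""
    k = H.shape[1]
    G = np.linalg.solve(H @ H.T + np.diag(d), H)
    R = np.eye(k) - H.T @ G + G.T @ sigma0 @ G
    L = np.linalg.cholesky(R)
    H_new = np.linalg.solve(L, (sigma0 @ G).T).T
    return H_new, np.diag(sigma0 - H_new @ H_new.T).copy()

def fa_step_kp(sigma0, K, P, d):
    """One pass of Algorithm 4.2; no matrix square root is needed."""
    S = K @ np.linalg.solve(P, K.T) + np.diag(d)      # K P^{-1} K^T + D
    G = np.linalg.solve(S, K)                         # S^{-1} K
    K_new = sigma0 @ G                                # (4.9)
    P_new = P - K.T @ G + G.T @ sigma0 @ G            # P update, expanded
    low_rank = K_new @ np.linalg.solve(P_new, K_new.T)
    return K_new, P_new, np.diag(sigma0 - low_rank).copy()

rng = np.random.default_rng(5)
n, k = 6, 2
A = rng.standard_normal((n, n))
sigma0 = A @ A.T + n * np.eye(n)                      # hypothetical target covariance

H0, d0 = rng.standard_normal((n, k)), np.ones(n)
H, d = H0.copy(), d0.copy()
K, P, d_kp = H0.copy(), np.eye(k), d0.copy()          # Q_0 = I gives K_0 = H_0, P_0 = I
for _ in range(10):
    H, d = fa_step(sigma0, H, d)
    K, P, d_kp = fa_step_kp(sigma0, K, P, d_kp)

from_alg_41 = H @ H.T
from_alg_42 = K @ np.linalg.solve(P, K.T)
```

Only the product $HH^T$ is compared, since H itself is determined up to a unitary factor by the choice of square root.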

Notice that in both Algorithm 4.1 and 4.2 it is required to invert n × n
matrices (like e.g. $(H_tH_t^T + D_t)^{-1}$). Applying Corollary A.2 one gets
$(H_tH_t^T + D_t)^{-1}H_t = D_t^{-1}H_t(I + H_t^TD_t^{-1}H_t)^{-1}$, so that only the diagonal
matrix $D_t$ and a k × k matrix have to be inverted. Hence, we can replace e.g. (4.7) with

$$H_{t+1} = \Sigma_0D_t^{-1}H_t(I + H_t^TD_t^{-1}H_t)^{-1}R_t^{-1/2}. \tag{4.10}$$

By the same token one can write

$$K_{t+1} = \Sigma_0D_t^{-1}K_t(P_t + K_t^TD_t^{-1}K_t)^{-1}P_t$$

to replace (4.9).
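
The identity behind (4.10) is easy to confirm numerically. The sketch below, with hypothetical dimensions and random data, computes both sides: the left one through an n × n linear solve, the right one through a k × k solve and a division by the diagonal of D.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 50, 3
H = rng.standard_normal((n, k))
d = rng.uniform(0.5, 2.0, size=n)            # diagonal entries of D

# Left-hand side: (H H^T + D)^{-1} H via an n x n system.
direct = np.linalg.solve(H @ H.T + np.diag(d), H)

# Right-hand side: D^{-1} H (I + H^T D^{-1} H)^{-1} via a k x k system.
Dinv_H = H / d[:, None]                      # row scaling applies D^{-1}
small = np.eye(k) + H.T @ Dinv_H             # k x k and symmetric
reduced = np.linalg.solve(small.T, Dinv_H.T).T
```

For n much larger than k the second route replaces a cubic cost in n by a cost that is linear in n, which is the practical motivation for (4.10).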

Some properties of the algorithm are summarized in the next proposition. 

Proposition 4.3. For Algorithm 4.1 the following hold for all t.

(a) $D_t > 0$ and $(D_t)_{ii} \le (\Sigma_0)_{ii}$.

(b) $R_t$ is invertible.

(c) If $H_0$ is of full column rank, so is $H_t$.

(d) $H_tH_t^T \le \Sigma_0$.

(e) If $\Sigma_0 = H_0H_0^T + D_0$ then the algorithm stops.

(f) The objective function decreases at each iteration. More precisely, let $\Sigma_{0,t}$
be the solution of the first partial minimization with data $\Sigma_t = \Sigma(H_t, D_t, Q_t)$.
Then

$$D(\Sigma_0\|H_{t+1}H_{t+1}^T + D_{t+1}) - D(\Sigma_0\|H_tH_t^T + D_t) = -\big(D(\Sigma_{t+1}\|\Sigma_t) + D(\Sigma_{0,t}\|\Sigma_{0,t+1})\big).$$

(g) The limit points (H, D) of the algorithm satisfy the relations

$$H = (\Sigma_0 - HH^T)D^{-1}H,$$
$$D = \Delta(\Sigma_0 - HH^T).$$

Proof. (a) This follows from Remark 3.8.

(b) Use the identity $I - H_t^T(H_tH_t^T + D_t)^{-1}H_t = (I + H_t^TD_t^{-1}H_t)^{-1}$ and the
assumption $\Sigma_0 > 0$.

(c) Use the assumption $\Sigma_0 > 0$, (a), and (b).

(d) Again from Remark 3.8 and the construction of the algorithm as a
combination of the two partial minimization problems.

(e) This is a triviality upon noticing that one can take $R_t = I$ in this case.

(f) It follows from a concatenation of Lemma 3.4 and Proposition 3.7. Notice
that we can express the decrease as the sum of two I-divergences, since the
Pythagorean law holds for both partial minimizations.

(g) We consider Algorithm 4.2 first. Assume that all variables converge. Then,
from (4.9), the limit points K, P, D satisfy the relation $K = \Sigma_0D^{-1}K(P +
K^TD^{-1}K)^{-1}P$. Postmultiplication by $P^{-1}(P + K^TD^{-1}K)$ yields, after
rearranging terms, $K = (\Sigma_0 - KP^{-1}K^T)D^{-1}K$. Let now Q be a square root of P
and $H = KQ^{-1}$ to get the first relation. The rest is trivial. □






4.2 Proof of Proposition 2.2



Let $D_0$ and $H_0$ be arbitrary and perform one step of the algorithm to get
matrices $D_1$ and $H_1$. It follows from Proposition 4.3 that $D(\Sigma_0\|H_1H_1^T + D_1) \le
D(\Sigma_0\|H_0H_0^T + D_0)$. Moreover, $H_1H_1^T \le \Sigma_0$ and $D_1 \le \Delta(\Sigma_0)$. Hence the search
for a minimum can be confined to the set of matrices (H, D) satisfying $HH^T \le
\Sigma_0$ and $D \le \Delta(\Sigma_0)$. Next, we claim that it is also sufficient to restrict the
search for a minimum to all matrices (H, D) such that $HH^T + D \ge \varepsilon I$ for some
sufficiently small $\varepsilon > 0$. Indeed, if the last inequality is violated, then $HH^T + D$
has an eigenvalue less than $\varepsilon$. Write the Jordan decomposition $HH^T + D =
U\Lambda U^T$, and let $\Sigma_U = U^T\Sigma_0U$. Then $D(\Sigma_0\|HH^T + D) = D(\Sigma_U\|\Lambda)$, as one
easily verifies. Denoting by $\lambda_i$ the eigenvalues of $HH^T + D$ and letting $\sigma_{ii}$ be the
diagonal elements of $\Sigma_U$ we can write $D(\Sigma_U\|\Lambda) = -\frac{1}{2}\log|\Sigma_U| + \frac{1}{2}\sum_i \log\lambda_i -
\frac{n}{2} + \frac{1}{2}\sum_i \frac{\sigma_{ii}}{\lambda_i}$. Let $\lambda_{i_0}$ be a minimum eigenvalue and take $\varepsilon$ smaller than the
minimum of all $\sigma_{ii}$, which is positive (indeed bounded below by the smallest
eigenvalue of $\Sigma_0$), since $\Sigma_0$ is strictly positive definite. Since
$\lambda \mapsto \log\lambda + \sigma_{i_0i_0}/\lambda$ is decreasing on $(0, \sigma_{i_0i_0})$ and $\lambda_{i_0} < \varepsilon$,
the contribution for $i = i_0$ in the summation to the divergence $D(\Sigma_U\|\Lambda)$ is
at least $\frac{1}{2}(\log\varepsilon + \sigma_{i_0i_0}/\varepsilon)$, which tends to infinity for $\varepsilon \to 0$. This proves the claim.
So, we have shown that a minimizing pair (H, D) has to satisfy $HH^T \le \Sigma_0$,
$D \le \Delta(\Sigma_0)$, and $HH^T + D \ge \varepsilon I$, for some $\varepsilon > 0$. In other words we have to
minimize the I-divergence over a compact set on which it is clearly continuous.
This proves Proposition 2.2.



A Appendix 

For ease of reference we collect here some standard formulas for the normal 
distribution and some matrix algebra. 

A.1 Multivariate normal distribution

Let $(X^T, Y^T)^T$ be a zero mean normal vector with covariance matrix

$$\Sigma = \begin{pmatrix}\Sigma_{XX} & \Sigma_{XY}\\ \Sigma_{YX} & \Sigma_{YY}\end{pmatrix}.$$

Assume that $\Sigma_{YY}$ is invertible. The conditional law of X given Y is normal
with $E[X|Y] = \Sigma_{XY}\Sigma_{YY}^{-1}Y$ and

$$\mathrm{Cov}[X|Y] = \Sigma_{XX} - \Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX}. \tag{A.1}$$

A.2 Partitioned matrices

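Lemma A.1. Let A, D be square matrices. Assume invertibility where required.
Then

$$\begin{pmatrix} A & C\\ B & D\end{pmatrix} = \begin{pmatrix} I & CD^{-1}\\ 0 & I\end{pmatrix}\begin{pmatrix} A - CD^{-1}B & 0\\ 0 & D\end{pmatrix}\begin{pmatrix} I & 0\\ D^{-1}B & I\end{pmatrix},$$

$$\begin{pmatrix} A & C\\ B & D\end{pmatrix} = \begin{pmatrix} I & 0\\ BA^{-1} & I\end{pmatrix}\begin{pmatrix} A & 0\\ 0 & D - BA^{-1}C\end{pmatrix}\begin{pmatrix} I & A^{-1}C\\ 0 & I\end{pmatrix},$$

$$\begin{pmatrix} A & C\\ B & D\end{pmatrix}^{-1} = \begin{pmatrix} (A - CD^{-1}B)^{-1} & -(A - CD^{-1}B)^{-1}CD^{-1}\\ -D^{-1}B(A - CD^{-1}B)^{-1} & D^{-1}B(A - CD^{-1}B)^{-1}CD^{-1} + D^{-1}\end{pmatrix}.$$

Corollary A.2.

$$(D - BAC)^{-1} = D^{-1} + D^{-1}B(A^{-1} - CD^{-1}B)^{-1}CD^{-1}.$$

Proof. For Lemma A.1 a check will suffice. The Corollary follows using the two
decompositions of the Lemma with A replaced by $A^{-1}$ and comparing the two
expressions of the lower right block of the inverse matrix. □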
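
The "check" mentioned in the proof can also be carried out numerically. The sketch below uses small, well conditioned random blocks (an assumption made so that all required inverses exist) to verify the block inverse of Lemma A.1 and the identity of Corollary A.2.

```python
import numpy as np

rng = np.random.default_rng(7)
p, q = 3, 4
# Hypothetical blocks: small perturbations of scaled identities keep every
# matrix that has to be inverted well conditioned.
A = np.eye(p) + 0.1 * rng.standard_normal((p, p))
D = 3.0 * np.eye(q) + 0.1 * rng.standard_normal((q, q))
B = 0.1 * rng.standard_normal((q, p))
C = 0.1 * rng.standard_normal((p, q))

M = np.block([[A, C], [B, D]])

# Block inverse of Lemma A.1, built from the Schur complement A - C D^{-1} B.
Dinv = np.linalg.inv(D)
S_inv = np.linalg.inv(A - C @ Dinv @ B)
block_inverse = np.block([
    [S_inv, -S_inv @ C @ Dinv],
    [-Dinv @ B @ S_inv, Dinv @ B @ S_inv @ C @ Dinv + Dinv],
])

# Corollary A.2: (D - B A C)^{-1} = D^{-1} + D^{-1} B (A^{-1} - C D^{-1} B)^{-1} C D^{-1}.
lhs = np.linalg.inv(D - B @ A @ C)
rhs = Dinv + Dinv @ B @ np.linalg.inv(np.linalg.inv(A) - C @ Dinv @ B) @ C @ Dinv
```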